CombPDX: a unified statistical framework for evaluating drug synergism in patient-derived xenografts

Anticancer combination therapy has been developed to increase efficacy by enhancing synergy. Patient-derived xenografts (PDXs) have emerged as reliable preclinical models to develop effective treatments in translational cancer research. However, most PDX combination study designs focus on single dose levels, and dose–response surface models are not appropriate for testing synergism. We propose a comprehensive statistical framework to assess joint action of drug combinations from PDX tumor growth curve data. We provide various metrics and robust statistical inference procedures that locally (at a fixed time) and globally (across time) access combination effects under classical drug interaction models. Integrating genomic and pharmacological profiles in non-small-cell lung cancer (NSCLC), we have shown the utilities of combPDX in discovering effective therapeutic combinations and relevant biological mechanisms. We provide an interactive web server, combPDX (https://licaih.shinyapps.io/CombPDX/), to analyze PDX tumor growth curve data and perform power analyses.

the number of animals required per group. In the fixed dose experiment, the dose-response surface methods are not applicable, and the joint action of drugs should be evaluated at a fixed dose combination.
The statistical framework that assesses the joint action of drug combinations with fixed doses is not well developed. Wu et al. 22,23 proposed interaction indices and the statistical inferential procedure based on surviving fraction of cells and survival endpoints. Demidenko and Miller 24 proposed a log-linear model on the tumor volumes by assuming that they follow exponential growth. However, these are limited to the Bliss independence model. A distinctive set of mathematical definitions might lead to different quantifications of the degree of joint action.
In this article, we propose a comprehensive statistical framework to calculate combination indices (CIs) and the inferential procedures that are robust to potential outliers and errors in PDX experiments. We considered three most well-established reference models: HSA, response additivity (RA), and BI in a unified statistical quantification method. We present a user-friendly web server, combPDX, to compile the joint actions over time from PDX tumor growth data and to provide a power analysis tool to facilitate designs of PDX combination studies. Applying our methods to non-small-cell lung cancer (NSCLC), we show the utilities of our framework in finding underlying mechanisms of combination drug action using gene expression profiles for PDX models.

Results
Overview. The pipeline of combPDX includes three steps to assess combination effects as well as the power analysis procedure (Fig. 1). Longitudinal raw tumor volume measurements are collected from four treatment groups (C, A, B and AB) and the tumor growth curves are displayed (Fig. 1a). For each individual mouse, the response at each time point is determined by computing the relative tumor volume to adjust for heterogeneous initial tumor measurements across animals, and the missing relative tumor volumes at time t are interpolated using the neighboring measurements (Fig. 1b). In some PDX experiments, missing values may occur due to multiple inconsistent measurements or missing assessments at a given time point. At each time point, we determine the treatment effects of A, B, and AB compared to the control group C (Fig. 1c). Based on the treatment effects, we provide the CIs under HSA, RA, and BI, and the corresponding 95% confidence intervals (Fig. 1d). We finally implement a power analyses tool under the three reference models (Fig. 1e).
Simulation studies. We conducted a series of simulations to examine the performance of the proposed approaches. The expected tumor volumes under singe-agent treatment or control group were generated by the Gompertz tumor growth model over time 25,26 . Then the expected tumor volumes for combination drug were generated under each of the reference model (HSA, RA, and BI). The expected tumor growth curves for these simulation settings are shown in Fig. S1. For each setting, we generated 1000 replicate datasets with sample sizes of 5 or 10 per group. The detailed data generating process is described in Section S1. We evaluated the CIs with the confidence intervals at day 21 obtained from all the three reference models under each of the simulation scenarios. We compared the inferential methods with/without the bootstrap procedures based on 1,000 replications. The coverage probabilities are summarized in Table S1. When sample size is 5 in each group, the confidence intervals without the bootstrap procedure were slightly narrower than nominal level, resulting in inflated  www.nature.com/scientificreports/ type I error under the null hypothesis. When we have more sample size (n = 10) in each group, confidence intervals without bootstrap become close to the nominal coverage probability, and the confidence intervals with and without bootstrap tend to agree on each other. Overall, the bootstrap procedure helps in constructing more accurate confidence intervals when small sample size while both procedures provide valid statistical inferential performance when sample size becomes larger.
Evaluation of drug combinations of KRT232, navitoclax, and trametinib in NSCLC. We performed the analysis of a real study of the antitumor activity of the combination of KRT232, navitoclax, and trametinib using the PDX models for NSCLC, where a total of 28 PDX models were tested [27][28][29] . For each PDX tumor model, mice were randomized to the four treatment arms, and the tumor volume for each mouse was recorded every 2-3 days. 20 combination therapy experiments having sufficient samples sizes were selected (Section S2). All analyses were performed in R v4.0.2 30 . The treatment information for the selected experiments is summarized in Table S2.

Molecular biomarkers associated with CIs.
Although KRT232 showed no synergistic signal in combination with navitoclax and trametinib (Table S3), we systematically investigated pathway-level signatures of the combination action in a framework of pharmacogenomic analysis utilizing multiple PDX experiments performed for the combinations. We conducted gene set variation analysis (GSVA) 32 based on C2 collection of curated biological pathways as provided by the Broad Institute's collection 33 . The pathway enrichment score (ES) by GSVA provides single-number summaries of pathway activity for each sample and each pathway. The PDX www.nature.com/scientificreports/ models were profiled for their expression of 15,732 genes using RNA-seq after filtering out genes whose 75% percentile was less than 20. Pathways with less than 10 genes were excluded, which resulted in 5,164 pathways in total. We selected PDX models that had RNA-seq data available and fulfilled the Bliss assumption ( 0 ≤ δ g ≤ 1 ) across all time points. We came up with seven and nine PDXs in KRT232 plus navitoclax and KRT232 plus trametinib (Section S2). We performed distance correlation tests 34 for detecting linear/nonlinear associations between pathways and the combination indices (Section S2). In combination treatment KRT232 plus navitoclax in NSCLC, controlling FDR at 0.1, resulting in 136, 150, and 145 pathways were significantly associated with HSA, RA, and BI, respectively, with 118 intersecting pathways across the three reference models (Fig. 3a). Heatmap of those pathways associated with at least one of the CIs shows clear pattern of two clusters of PDX models (Fig. 3b). The top significant pathways include those related to the therapeutic target or the prognosis in NSCLC ( Fig. 3c and Data S1). For example, the p53 pathway interplays with MDM2 inhibitor KRT-232 to suppress tumor cell growth 27,35,36 . Moreover, IRAK and HIF-1 pathways are associated with the development of tumor in NSCLC [37][38][39] .
Similar analysis was performed in combination treatment KRT232 plus trametinib. Controlling FDR at 0.1, resulting in 27, 12, and 13 pathways were significantly associated with HSA, RA, and BI, respectively, with seven intersecting pathways across the three reference models ( Fig. S2a and Data S1). Heatmap of those 32 pathways also shows clear pattern of two clusters of PDXs (Fig. S2b). The seven intersecting pathways include those related to the therapeutic target in NSCLC. For example, several members in NFAT gene family differentially expressed in tumor vs. normal cells 43 , and FGFR3 is a potential therapeutic target in NSCLC 44,45 .

Discussion
In this article, we have proposed a comprehensive statistical framework to quantify the joint action of two drugs in standard in vivo combination experiments with fixed doses, where the dose-response models such as the Chou-Talalay method 46,47 and Isobologram 14 are not applicable. Our framework is generally applicable to tumor xenograft designs, including PDX experiments that is a newly designed novel model system for drug development and individualized treatment. The usual practice to decide combination effect in in vivo designs is based on p-values obtained from two-group comparisons of the combination group versus the control and Heatmap of baseline pathway enrichment score from GSVA with the row annotation to be gCI . Each column represents a PDX model, and each row represents a gene whose enrichment score was significantly correlated with one or more gCI using the distance correlation test. (c) Volcano plot showing that p-value versus distance correlation for BI (the direction of the correlation is obtained by Spearman correlation). Blue dots represent significant differential pathways and black dots represent insignificant pathways. Panel (a) was generated using the R package VennDiagram 40 (version 1.6.0 https:// cran.r-proje ct. org/ web/ packa ges/ VennD iagram/ index. html), Panel (b) was generated using the R package ComplexHeatmap 41 (version 2.6.2 https:// github. com/ joker goo/ Compl exHea tmap), and Panels (c) was generated using the R package EnhancedVolcano 42 (version 1.8.0 https:// github. com/ kevin blighe/ Enhan cedVo lcano). www.nature.com/scientificreports/ monotherapy groups. However, the p-value subthresholding approach does not directly quantify the magnitude of the combination effect that is useful for further statistical modeling in pharmacogenomic setting where the driving molecular mechanisms of variable drug responses are systematically studied. The combPDX web-application provides the visualization and analysis pipeline of the longitudinal tumor volume data at fixed dose levels, as well as power analysis tool to design in vivo combination experiments. Our framework is inspired by effect-based approaches that compare the effect from the combination of two drugs AB to the effects from its individual mono-drugs A and B, following the determination of efficacy of each treatment compared to the control. Various metrics have been suggested to summarize each individual growth curve to a value, e.g., the adjusted area under the tumor growth curve (aAUC) 48 , which can be employed in our CI calibrations based on the fact that the aAUC is interpreted as the mean tumor volume across time.
There is no global consensus on defining drug synergism/antagonism in the field. Different reference models may lead to inconsistent statistical significance of combination effects based on different underlying assumptions. We have extensively studied the differences of these three models in the mathematical formulations. With fixed group means of C, A, B, and AB, the HSA model always provides the highest CI value, while RA has the lowest CI in that it provides the most conservative procedure to declare a synergistic effect (see Section S3.6). The CI derived under HSA is formulated similarly to t-tests that compare tumor volume data of combination therapy group vs. the better monotherapy group although the bootstrap procedure of HSA is more robust than the t-test. Both methods have the limitation of not utilizing tumor volume measurements from control group. Thus, the synergism declared from these methods should be considered as the minimal evidence and used for drugs which mono-therapeutic effects have been proved sufficiently in the field. In a counter example of drug antagonism (Fig. S3) where the two drugs present a clear antagonistic pattern, HSA shows additivity because the tumor volume of the combination drug is close to that of trametinib, however, RA and BI, declared antagonism by adjusting the tumor volumes in control group (Table S3).
Using tumor volume data from NSCLC PDX experiments that evaluate two-drug combinations of KRT232, navitoclax, and trametinib, we have shown the utilities of calibrating CI s in finding underlying mechanisms of combination drug response based on gene expression data. We found that trametinib plus navitoclax had the synergistic effects in HSA and BI models, which is along with the currently undergoing clinical trials for the treatment of NSCLC 31 . Moreover, the integrative analysis of KRT-232 plus navitoclax pharmacological data with gene expression data provided highly concordant pathway signatures across the three reference interaction models. These pathways included major driving mechanisms of the combination therapy in NSCLC.
In summary, combPDX represents an important step towards combination effect calibration for PDX models. It provides comprehensive data analysis and result visualization for in vivo combination drug testing. Coupled with molecular profile, combPDX facilitates the discovery of new biomarkers for combination therapy. Going forward, the knowledge of the biological mechanisms will add promise to the identification of the optimal personalized treatment. Tumor volume data processing. The combPDX requires a long data matrix as an input where each of the tumor volume measures are stacked by rows with four columns: mice ID, treatment, day, and tumor volume. Table S4 presents the description of these metrics and Table S5 shows an example of input data. Due to the variation of initial tumor volumes across mice, the response for an individual animal at each time point is defined by the relative tumor volume, which is the raw tumor volume divided by the initial tumor volume of the mouse. We denote v t as the relative tumor volume for a mouse at time t (Section S3.1). Unfortunately, missing values may arise from data entry errors. For example, discrepant records may be entered for the same subjects or missing assessment for a subject at a given time point. For subjects with no tumor volume measurement at time t , but with flanking volume measurements at time t 0 and t 1 , we use linear interpolation to impute the relative tumor volume at time t:

Method
An alternative and likely more accurate interpolation approach is to model the tumor growth curves while accounting for the innate tumor environment. This is often infeasible because quantifying the innate tumor environment is often unavailable. If such information is available, the linear interpolation can be replaced, and then the treatment effect and combination index can be inferred based on the desired imputed values.
Missing value interpolations are used to maintain sample size by leveraging information from existing data points. In this manner, additional assumptions are imposed on the data analysis. For example, the innate tumor environment of a mouse is similar to that of its nearby time points, or the log tumor growth is at the same rate as its neighborhood. Therefore, care should be taken if an inference is carried out with multiple interpolations of missing values at a given time point. We recommend that no more than half of the values be interpolated data points within a subgroup at a given time for data analysis. www.nature.com/scientificreports/ Denote the relative tumor volume for an individual mouse in group g = C, A, B, AB as v g , which follows an independent identical distribution with mean µ g and variance σ 2 g . The mean and variance parameters µ g and σ 2 g can be estimated by the sample mean v g and sample variance σ 2 g . Our analytic inference procedure in this paper is based on the central limit theorem, v g follows normal distribution with mean µ g and variance σ 2 g /n g , where n g is the sample size in treatment group g (Section S3.1). Combined with Bootstrap procedures, we re-calibrate the null distribution of test statistics to propose a robust statistical framework to any deviations from the theoretical distribution.
Determination of treatment effect. To access the combination effect of two drugs, we take effect-based approaches that directly compare the effect of the combination to the effects of its individual components. The effect of a treatment group (A, B or AB) is defined by the antitumor activity compared with the control group (C). At a given time point, treatment effect for a group g is quantified by the mean reduction in the relative tumor volumes between treatment and control groups divided by the control mean A large δ g value indicates a strong treatment effect. For combination experiments, we expect that the mean relative tumor volumes of the treatment groups µ A , µ B and µ AB are less than the control tumor volume µ C , which results in δ A , δ B and δ AB located between 0 and 1. While this has been widely used as the tumor growth inhibition (TGI) with predetermined cutoffs of declaring an antitumor activity 49-51 , we incorporate statistical inferential procedures by constructing 95% confidence intervals. The lower bound of a one-sided 100(1-α)% confidence interval for a combination index can be calculated using the Delta method, , and z 1−α is the ( 1 − α)-th quantile of standard normal distribution (Section S3.2).

Combination index (CI).
Based on the treatment effects δ g evaluated for all three treatment groups A, B and AB, we aim to access the superiority of using drug combination AB to individual drugs A and B. Although there is no consensus on defining the synergistic action of two drugs 52 , we derive combination indices under the three popular reference models: (1) HSA ( CI HSA ), (2) RA (CI RA ), and (3) BI (CI BI ). All the CIs under the three models are calibrated to have the same implication: CI < 0, = 0, > 0 represent antagonistic, independent, and synergistic effects, respectively.
Highest Single Agent (HSA) [6][7][8]53 shows that the synergistic combination effect occurs when the combined effect is greater than the more effective individual component: max(δ A , δ B )/δ AB < 1 . Under the HSA, we derive CI with respect to group means: where g is chosen from A or B that has larger effect δ . If the two single agent effects are equal, δ A = δ B , we choose the one that has the narrower confidence interval evaluated. The CI HSA is independent of the control experiment.
Response Additive (RA) 53,54 assumes that the fixed-dose two-drug combination has a linear additive effect under independence. A combination drug is considered synergistic if it shows a more enhanced effect than the sum of the two monotherapies' effects: (δ A + δ B )/δ AB < 1 . The corresponding CI can be derived with respect to group means as: The Bliss Independence (BI) approach 6,8,13,14,53 is based the definition of independence on its probabilistic interpretation. Two drugs are independent if one drug's presence does not affect the probability of another drug's effect on tumor growth decay. Wu et al. 22 proposed an interaction index under such a definition using relative tumor volume. Assume that the treatment effects δ g are the outcomes of a probabilistic process such that 0 ≤ δ g ≤ 1 . Note that δ g in Eq. (1) takes a value between 0 and 1 if µ C > µ g for g = A, B, AB. Under BI, if the combination therapy is considered synergistic, we have (δ A + δ B − δ A δ B )/δ AB < 1 . The combination index is given by.
Calibration of confidence intervals for CIs. We develop statistical inferential procedures by deriving confidence intervals for the CIs under the asymptotic normality in addition to the bootstrap method. Using the Delta method, the standard errors of the indices are approximated by: . www.nature.com/scientificreports/ A two-sided 100(1 − α)% confidence interval is constructed as CI − z 1−α/2 se CI , CI + z 1−α/2 se CI (see Derivations in Sections S3. 3-3.5).
The standard intervals based on asymptotic approximation to normal distributions can be inaccurate in practice due to skewness and heavier tails of tumor volume data. A robust bootstrap procedure is provided to construct confidence intervals 55 . Bootstrap is a resampling method that samples the original data with replacements iteratively to estimate test statistics or null distributions. To calculate bootstrap-t interval at a given time point, we repeat the following steps B times. For a given reference model, HSA, RA or BI, at the iteration b, we (1) Sample a size of n g animals with replacement within each treatment arm g , and the corresponding tumor volume measurement for an animal is denoted by v * b g . (2) Calculate combination index CI * b based on the bootstrap samples from the step 1, v * b g for g = C, A, B, AB.
is calculated by the theoretical standard error calibrated.
The α-th percentile of Z *b is estimated by the value t (α) such that # Z * b ≤ t (α) /B = α . Finally, the bootstrap-t confidence interval is constructed by CI − t (1−α/2) se CI , CI − t (α/2) se CI . This bootstrap procedure adjusts for derivations from the asymptotic distribution of CI by recalibrating the percentiles.
Global assessment of combination effect. The CI values assess combination effects at a given time point. We extend the procedure to a global assessment for any given study intervals of interest. Given reference model, the global CI is defined as the average of the CIs within the study interval of interest for time points 1, . . . , T: where CI t is the combination index at time t.Due to correlations of tumor volume measurements within individual mouse, we conduct nested bootstrap procedure to construct a confidence interval for gCI without analytically specifying the variance var gCI 55 . The algorithm to construct confidence interval for gCI is similar to that in previous section, with an additional nested layer to estimate se * gCI * b . At the iteration b, we (1) Sample a size of n g animals with replacement within each treatment arm g , the corresponding growth curve data for an animal are denoted by v * b g = v * b g,1 , v * b g,2 , . . . , v * b g,T .
(2) Calculate combination index gCI * b based on the v * b g , g = C, A, B, AB. We repeat the following nested bootstrap procedure L = 25 times to estimate se * gCI * b , (a) Sample a size of n g growth curves with replacement from v * b g within each group and denote the corresponding growth curve data as v * b,l g = v * b,l g,1 , v * b,l g,2 , . . . , v * b,l g,T for each animal.
Power analysis. Based on the CIs under the HSA, BI and RA reference models, we provide power analysis tool to design PDX combination experiments. Under the null hypothesis of independent action of a drug combination, H 0 : CI = 0 , where the CI follows asymptotic normal distribution N(0, var(CI)) . Assume that under www.nature.com/scientificreports/ the alternative hypothesis H A : CI = γ , where CI follows normal distribution N(γ, var(CI)) . Therefore, with prespecified values µ g , σ g , n g for each group g , the power is calculated as where α , β are the desired Type I and Type II error rates. Given mean tumor volumes, Fig. 4a (Fig. S4) illustrates the minimum δ AB (maximum tumor volume µ AB ) having synergistic effect under each reference model given δ A and δ B , and we also mathematically compared CI HSA , CI RA and CI BI in Section S3.6. Given mean tumor volumes, we have 0 ≤ CI RA ≤ CI BI ≤ CI HSA , which implies that HSA model provides the most relaxed procedure while RA is the most conservative. Combined with the statistical inference procedure using the asymptotic normality, the power curves, Fig. 4b,c for sample sizes of 5 and 10, respectively indicate that RA (HSA) require the largest (smallest) effect sizes of the combination to achieve the same statistical power when the effect sizes of monotherapies and standard deviations are fixed.

Software.
We provide an interactive web server, combPDX (https:// licaih. shiny apps. io/ CombP DX/), to analyze PDX tumor growth curve data and perform power analyses. The combPDX currently includes five tabs: Upload Dataset, Visualize Results, Power calculation, Download Results and Batch Analysis (Figs. S5-S7). These allow the users to test treatment effect as well as combination effect, view output figures and tables, download result and run batch analysis. The tutorial is presented in Section S4.